A General Framework for Biclustering Gene Expression Data

نویسندگان

  • Haifeng Li
  • Xin Chen
  • Keshu Zhang
  • Tao Jiang
چکیده

A large number of biclustering methods have been proposed to detect patterns in gene expression data. All these methods try to find some type of biclusters but no one can discover all the types of patterns in the data. Furthermore, researchers have to design new algorithms in order to find new types of biclusters/patterns that interest biologists. In this paper, we propose a novel approach for biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters because randomness naturally consists in a lack of regularity, which is a common property of all types of patterns. On the basis of algorithmic probability measure, we develop a Markov Chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard type biclustering. The preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Latent Feature Model Approach to Biclustering

Biclustering is the unsupervised learning task of mining a data matrix for useful submatrices, for instance groups of genes that are co-expressed under particular biological conditions. As these submatrices are expected to partly overlap, a significant challenge in biclustering is to develop methods that are able to detect overlapping biclusters. The authors propose a probabilistic mixture mode...

متن کامل

UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data.

Biclustering algorithms, which aim to provide an effective and efficient way to analyze gene expression data by finding a group of genes with trend-preserving expression patterns under certain conditions, have been widely developed since Morgan et al. pioneered a work about partitioning a data matrix into submatrices with approximately constant values. However, the identification of general tre...

متن کامل

به کارگیری خوشه‌بندی دوبعدی با روش «زیرماتریس‌های با میانگین- درایه‌های بزرگ» در داده‌های بیان ژنی حاصل از ریزآرایه‌های DNA

Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. Using this technology, which made it possible to simultaneously analyze expression levels for thousands of genes under different conditions, massive amounts of information will be obtained. While traditional clustering methods, such as hierarchical and K-means clustering have been...

متن کامل

BiRange:An Efficient Framework for Biclustering of Gene Expression Data Using Range Bipartite Graph

Biclustering is a vital data mining tool which is commonly employed on microarray data sets for analysis task in bioinformat ics research and medical applications. There has been extensive research on biclustering of gene expression data arising from microarray experiment. This technique is an important analysis tool in gene expression measurement, when some genes have multip le functions and e...

متن کامل

QUBIC: a qualitative biclustering algorithm for analyses of gene expression data

Biclustering extends the traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still the real power of this clustering strategy is yet to be fully realized due to the lack of effective and efficient algorithms for reliably solving the genera...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Journal of bioinformatics and computational biology

دوره 4 4  شماره 

صفحات  -

تاریخ انتشار 2006